Towards Developing a Multi-Dialect Morphological Analyser for Arabic

Authors

  • Khalid Almeman
  • Mark Lee

Abstract

In this paper we address the problem of analysing multi-dialect Arabic morphology. Our method is based on the synthesis of two methods. The first method is linguistically based: it uses an adapted Modern Standard Arabic (MSA) morphological analyser that first deals with dialect prefixes and suffixes and then analyses the words. This method improves the accuracy of analysing dialect words by 69%. The second method segments the word and then uses 'the web as corpus' to estimate the frequencies of different segment combinations, which are then used to guess the correct base form. The combined approach achieves 94% accuracy on a corpus of Arabic dialects.

Keywords: Morphology Analyser; Multi-Dialect; Web Corpus
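The two methods summarised above can be illustrated with a small sketch. It is not the authors' implementation. The dialect affix lists, the toy MSA lexicon standing in for the adapted analyser, the clitic inventory, and the `freq` dictionary standing in for web hit counts are all hypothetical, and words are written in Buckwalter transliteration. The rule for combining the two methods (try affix stripping first, fall back to frequency-based segmentation) is also an assumption, since the abstract does not say how the synthesis works.

```python
from typing import Dict, Iterator, Optional, Tuple

# Illustrative dialect affixes in Buckwalter transliteration (NOT the paper's lists):
# "b" = progressive bi-, "H" = future Ha-, "$" = negation -sh.
DIALECT_PREFIXES = ("b", "H")
DIALECT_SUFFIXES = ("$",)

# Hypothetical stand-in for an MSA morphological analyser: surface form -> root.
MSA_LEXICON: Dict[str, str] = {"yktb": "ktb", "ktb": "ktb", "ydrs": "drs"}


def msa_analyse(word: str) -> Optional[str]:
    """Return a root if the (toy) MSA analyser recognises the word."""
    return MSA_LEXICON.get(word)


def analyse_with_affix_stripping(word: str) -> Optional[Tuple[str, str]]:
    """Method 1: strip dialect affixes, then pass each candidate to the MSA analyser."""
    candidates = [word]
    for p in DIALECT_PREFIXES:
        if word.startswith(p) and len(word) > len(p):
            candidates.append(word[len(p):])
    # Also try removing a dialect suffix from every candidate collected so far.
    for s in DIALECT_SUFFIXES:
        for c in list(candidates):
            if c.endswith(s) and len(c) > len(s):
                candidates.append(c[: -len(s)])
    # Return the first candidate the MSA analyser accepts.
    for c in candidates:
        root = msa_analyse(c)
        if root is not None:
            return c, root
    return None


# Hypothetical clitic inventory for Method 2 ("" means "no affix").
CLITIC_PREFIXES = ("", "w", "b", "wb", "H")
CLITIC_SUFFIXES = ("", "$", "h", "hA")


def segmentations(word: str) -> Iterator[Tuple[str, str, str]]:
    """Enumerate (prefix, stem, suffix) splits allowed by the clitic inventory."""
    for p in CLITIC_PREFIXES:
        if not word.startswith(p):
            continue
        rest = word[len(p):]
        for s in CLITIC_SUFFIXES:
            if not rest.endswith(s):
                continue
            stem = rest[: len(rest) - len(s)]
            if stem:
                yield p, stem, s


def best_segmentation(word: str, freq: Dict[str, int]) -> Tuple[str, str, str]:
    """Method 2: pick the split whose stem has the highest frequency.

    `freq` stands in for web hit counts. On ties (including all-zero counts)
    max() keeps the first candidate, which is the unsegmented word.
    """
    return max(segmentations(word), key=lambda seg: freq.get(seg[1], 0))


def analyse(word: str, freq: Dict[str, int]) -> str:
    """Assumed combination: use Method 1 if it succeeds, else Method 2's stem."""
    result = analyse_with_affix_stripping(word)
    if result is not None:
        return result[0]
    return best_segmentation(word, freq)[1]
```

For example, `analyse("wbyktb$", {"yktb": 500, "byktb": 40})` fails in the affix-stripping step because the conjunction `w-` is not in the toy dialect prefix list, so the frequency-based segmentation selects the stem `yktb`. This division of labour between the two methods is only illustrative.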

Publication date: 2012